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Abstract 

This article introduces an algorithm for implicit High Dimensional 
Model Representation (HDMR) of the Bellman equation. This ap- 
proximation technique reduces memory demands of the algorithm con- 
siderably. Moreover, we show that HDMR enables fast approximate 
minimization which is essential for evaluation of the Bellman function. 
In each time step, the problem of parametrized HDMR minimization 
is relaxed into trust region problems, all sharing the same matrix. 
Finding its eigenvalue decomposition, we effectively achieve estimates 
of all minima. Their full-domain representation is avoided by HDMR 
and then the same approach is used recursively in the next time step. 
An illustrative example of N-armed bandid problem is included. We 
assume that the newly established connection between approximate 
HDMR minimization and the trust region problem can be beneficial 
also to many other applications. 
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1 Introduction 



The main focus of this article is to develop an approximate tool suitable for 
enlarging the class of computationally feasible decision-making problems. It 
copes with the principal problem within the stochastic dynamic program- 
ming, which is known as the curse of dimensionality, see [1]. The central 
notion of stochastic dynamic programming is the Bellman function, see for 
instance [2]. Once we are able to find and store this function, it is easy to 
derive the optimal strategy. However, the exact calculation of the Bellman 
function is computationally infeasible in the majority of practical applica- 
tions, and also its representation as a lookup-table is intractable. 

Next, we present a survey of approximate solutions to these problems. 
One way to reduce the size of the lookup-table is to aggregate the state 
space of the original problem into smaller sets. As it is not clear how to pick 
the best level of aggregation, several methods of multiple-level aggregation 
are developed [5]. A similar way to lookup-table reduction is approximation 
of the Bellman function which does not require any simplifications in the 
state space. A grid-based approximation with value interpolation is a typical 
example of such method [4j. The Bellman function can also be estimated us- 
ing regression models which are able to exploit specialized structures ("basis 
functions") in the state space [3]. Nonetheless, such methods are suitable for 
maximally hundreds of regression parameters. Another tool suitable for ap- 
proximation is the artificial neural networks utilized to learn the shape of the 
Bellman function, see [6] and references therein. Based on random sampling 
of the state space, a variety of Monte Carlo methods may be also applied, see 
for instance [7] . Temporal Difference methods are of quite a different nature. 
Opposite to the algorithm developed later, they do not operate with system 
model. They use simulated or experience-based sampling of system trajecto- 
ries instead, and thus they have no ambition to cover the whole state space. 
Nonetheless, they definitely do well for many real- world problems [H IH [10] . 

In this article, we develop a new approximate technique which consider- 
ably reduces both computational and memory demands of a decision-making 
problem. To this end, an approximation tool called High Dimensional Model 
Representations (thereinafter "HDMR") is useful [11]. It was applied to 
continuous function approximation in calculating reliability of uncertain me- 
chanical systems [12]. It was also utilized for solution of stochastic partial 
differential equations [13] and compared to Monte Carlo sampling. Another 
application of HDMR was volatility calibration [14] where it was compared 
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to cubic spline approximation. These successful implementations of HDMR 
in other fields encourage us to apply it to approximate dynamic program- 
ming. In the previous applications, it was used mainly for reducing the 
amount of data. The memory space necessary to store all the values of func- 
tion g{xi, . . . , Xd) grows exponentially with the dimension d, whereas the size 
growth of HDMR components is just quadratic in d. This is, of course, im- 
portant even in our case, but the newly established fact that HDMR permits 
fast approximate minimization may be even more essential in the context of 
the decision making theory. 

The outline of this work is as follows. Section [2] deals with the approxi- 
mation technique of HDMR, which is determined by a system of linear equa- 
tions. Its linearity does not match with the inherently non-linear Bellman 
equation. On that account, an algorithm for approximate minimization of 
function having HDMR form is developed in Section [31 Then, the current 
state of the art in the decision making theory is summarized at the begin- 
ning of Section m Next, a viable technique for approximate decision making 
based on HDMR is introduced there, and then the A^-armed bandit problem 
is tackled as an example. Section O is devoted to conclusion. 

Throughout this work, a few general conventions are followed. The do- 
main of the quantity x is denoted X, x G X, |X| denotes the count of 
elements of finite set X. Next, Xm denotes m-th coordinate for vector valued 
quantity x C M*^, x = {xi, . . . ,Xd)- This convention holds with one excep- 
tion: if we use letter t as a subscript, e.g. Xt, it stands for quantity x at the 
time instant t eT with T finite. Next, we reserve letter "/" for conditional 
probability density functions, arguments in the condition are separated by 
"I" in the argument list. For the domain of function h{x) we use dom(/i), 
and HDMR of h{x) is marked by h{x) with several exceptions pointed out 



2 High Dimensional Model Representation 

The approximation technique of HDMR has a particularly simple form. For 
a general function g{x), the second order HDMR g{x) reads 



later. 



g{x) = g{xi,X2, ...,Xd,) 



(1) 



d d-1 



d 
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Here, denotes a constant value over dom{g); one-dimensional functions 
9m{xm) describe independent effects of each particular coordinate x^, and 
two-dimensional functions gmn{xm,Xn) represent the joint effect of coordi- 
nates Xm and Xn- In the context of HDMR, these functions are called zero- 
order, first-order, and second-order components of HDMR, respectively. Ex- 
perience shows that second-order HDMR provides a sufficient approximation 
of g{x) as only low-order correlations amongst the input variables have a 
significant impact upon the outputs of a typical model [121 1131 [II]- 

There are many ways how to construct HDMR pTl [15] . To reduce this 
ambiguity, it is thus necessary to formalize its desired properties. The func- 
tion Hilbert space L'^{X) is a useful concept for the function approximation. 
It is a space of real functions defined over a set X with the finite norm ||5f|| 
defined as follows 




Then, the optimal HDMR of the function g G L'^{X) is defined as a mini- 
mizer of the approximation error 11(7 — ^||. The uniqueness of projection on 
closed subspaces of L'^{X) implies the uniqueness of minimizing function g{x) 
matching this form 

d 1 

g{x) = gm+^gmiXm) + - gmn{x.miXn), (3) 

m=l m,n=l 

where we slightly generalized ([1]) to better fit our needs. Nonetheless, there 
may exist various components g%, (jm and (jmn summing up to the same g. 
Now, let X be (i-dimensional product of finite sets X^ 

d 

4=1 

and let the integration in ([2]) be summation over X. Next, for any subset of 
indices / C {1, . . . , d} we define 

ie{i,...,rf}\/ 

Then, the optimal HDMR of g may be obtained from marginal operators 
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defined for function g as 



M0b] := 5^(7(yi,...,l/d) (6) 

Mm[g]{xm) ■■= ^ givi, ■ ■ ■ Vd) 

The formulae for HDMR components of the optimal g{x) read 

~m ■■= ^M0[^7] (7) 

gra{Xm) ■= T^^y^m[g]{Xm) - % 



mn I 



9mm (-^m ; -^m ) • 



The proposed variant of approximation matches "ANOVA-HDMR" in [TT]. 
From equations ([7]) we observe that identities 



QmiXni) = (8) 

ex 

ex 

hold for all m,n & {1, . . . , d}. In fact, construction ([7]) was intentionally de- 
signed to satisfy ([8]) in order to provide uniqueness of all HDMR components 
[llj. In our setting, however, identities ([8]) play also another important role 
in Section [31 

Finally, we note that this simple construction of HDMR is beneficial to 
our application, as the domain of the Bellman function could be too large to 
operate with all the function values at once. Still, its HDMR components can 
be computed by pointwise evaluation of the function values which are im- 
mediately added to proper sums in Next, we show that such convenient 
form of HDMR may be constructed even in a more general setting. 
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2.1 Weighted HDMR 



A more difficult construction of HDMR may occur in practise if tlie approxi- 
mated function g{x) is defined only on a strict subset of X, dom{g) = R C. X. 
Or, if the full domain X is too large to handle, and thus we search only for 
some approximation to the optimal HDMR, which may be constructed from 
samples of g{x) taken with respect to a smaller set, and so we have x & R C. X 
again. Both these situations may clearly arise in the decision-making theory. 

Under such conditions, it is important not to consider points X \ i? in 
the computation of the approximation error. Thus, instead of ([2]) we have to 
use a weighted norm 

II^IlL / Xr{.x) g{xf dx = / g{xfdx (9) 

J X J R 

with a weight equal to characteristic function 

Xr{.x) '■= 1 for 2; G R, Xr{^) '■= for x ^ R. (10) 

We note that for the case of product weight satisfying w{x) = Y['i=i''^ii^i)j 
the optimal HDMR with respect to may be obtained identically to ([7]), 
see again [11]. This is, however, not possible for an intrinsically non-product 
weight 

Yet, we can directly minimize the approximation error with respect to 
dnD, but instead of component- wise decoupled equations ([7]), we obtain one 
large linear system determining all the optimal HDMR components of g{x), 
see [16]. For smaller problems this system may be computationally feasible; 
however, a more convenient way is to slightly redefine our task. Instead 
of searching for an optimal approximation within the class of all functions 
having HDMR form ([3]), we search for it within a smaller class of such HDMR 
functions that are determined by decoupled formulae as in ([7]). The crucial 
property is the mutual independence of HDMR components: ^0 does not 
depend on any other HDMR component, each gm{xm) depends only on ^0, 
and finally each ^r„„(x„,x„) depends only on gm{xm), gn{xn) and ^0. By 
enforcing only these hierarchical relations we obtain an easier computation 
of HDMR components of g{x) at the price of worse approximation. 

We build such second order HDMR in three steps. First, we compute zero 
order component ^0 in such a way that it minimizes the approximation error 

— ^0!!^^. In the next step we fix this component and find such first or- 
der components gmi^m) that minimize approximation error \\g — g% — ^m|L 
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with respect to (70. Finally, we find second order components gmn{xm,Xn) 
as minimizers of \\g - gn) - (jm - gn - gmn\\^^ with gn,, and ^„(x„) 

kept fixed. The optimality conditions for such HDMR may be derived in 
three steps where each step is analogous to the original derivation of the full 
HDMR [11]. Thus, we obtain the following decoupled sytem of equations 
determining HDMR components of g{x) 

^9[xR-g] /-.-.x 
an '■= -m — r (n) 



gm (-^m ) 



^m[XR-g]{x n 

m 



gmny^mi ^n) ■ f TT ^ gm\Xm) gn\Xn) 5*0 

gmmi^'^mi •^m) 0. 

We observe that this system is a generalization of ([7j) for an arbitrary ap- 
proximation domain R = dom(5f) C X. 

A new problem, however, arose as formulae (|8]) are not valid any more 
in this general setting. As we have already indicated, these identities are 
beneficial in Section |3l so we need to readjust all the components g^, gm{xm) 
and gmn{xm,Xn) to Satisfy ([H]). Fortunately, this can be done easily without 
disturbing their optimality. We shift each component by the respective aux- 
ihary constant a©, 0"^, o"mn in such a way that ([8]) holds again. Formally, we 
define 



(Jr. 



gm{Xm) (12) 

m 

5Z 5Z ^" 



ex 



and then 



•^0 : = 



d 1 

^Crm + - ^ 0"mn- (13) 



2 

m=l m,n=l 



Next, we redefine HDMR components as 

gin 

gm (-^m) 



m + (14) 



mn • 
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The values of all a^n and amn determined by (IT^ now ensure the validity of 
dHD, and formula (fT5l) guarantees that the overall shift of values of g{x) is 
nullified, see ([3]), and so f lT^ does not affect the optimality of g{x). 

Even though equations f|TT]) and f|T^ seem to be more complicated, their 
computational complexity is similar to the full domain case ([7]). Therefore, we 
will refer to this more general result throughout this article. When dom{g) = 
X, both these approaches are equivalent. 



3 Fast Minimization of HDMR 

In this section, the main novelty of this article is developed. The key in- 
gredient of the proposed approximate dynamic programming technique is a 
fast approximate minimization of functions in HDMR form. We consider 
function g{x,z), dom(^) = X x Z, having the following structure 



m,n=l m=l m=l n=l 

where we denoted by k and /z the dimension of X and Z, respectively. This 
function corresponds to full HDMR of g{x, z) without all HDMR components 
independent of z. Since we are interested in a point-wise minima of g{x^ z), 

p{x) := mmg{x, z), (16) 

the previous restriction on components of g{x, z) is without loss of generality 
and it considerably eases the notation. 

Regardless of a specific choice of a; G X, the parametrized minimization 
in f[T^ is equivalent to the search for the clique of the minimal weight in 
a complete multi-partite edge- weighted graph [17]. To show it, identify dif- 
ferent Zjn as partite sets of the graph, z^ G Z^ as vertices in particular 
partite set Zm and gmn{,Zm, Zn) as weight of edge between vertices Zm G Z^ 
and Zn G Z„ taken from distinct partite sets with gmm = 0, as we claimed in 
(IHl). The additional weights of vertices gm{zm) and g^+n,m.{,Xn, Zm), the latter 
parametrized by a; G X, can be simply added to the weights of proper edges. 
This problem is known to be NP-hard [18] and as it plays a role of repeatedly 
solved subproblem here, we search only for an approximate solution of f|T6|) . 
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3.1 Problem Reformulation 

At the moment, it is fruitful to rewrite function g{x, z) in a more convenient 
form. For a finite set B and zG{l,...,|-B|}we denote B[i\ the z-th element 
of B. Then, for all m, G {1, . . . , /i} we define matrices F"*" in this way 

i^7:=^^„(Z„[z],Z„[j]). (17) 
In the same manner, we define matrices G"^" 

G'^":=^„„(Z„[^],X„[j]) (18) 

for all m G {1, . . . , /i} and n G {1, . . . , k} and vectors /i™ 

hf := ~g^{Z^[i\) (19) 

for all m G {1, . . . , /i}. Further, we compose all matrices into one matrix 
F with _F™-"' being the mn-th subblock of F . Similarly, we create matrix G 
out of matrices G^"^ and vector h consisting of subvectors /i™. Thus, we 
obtain a concise reformulation of g{x, z) 

\ 

-f{u,v) := -v'^Fv + h'^v + u^Gv, (20) 

where the only question left is to clarify the relation between vectors u, v, 
and the original variables x G X, 2; G respectively. 
We define 

e:=Y,\Zm\ (21) 

m=l 

and follow the logic of the previous construction to deduce the structure of 
the newly introduced vector f G M^. We see that it consists of /i subvectors 

t;"^ G {0,1}I^'"I, (22) 

which are related to coordinates Zm G Zm of the original variable 2; G Z as 

:= 1 z„ = t;f:=0 otherwise. (23) 

The relation of vector u to the original parameter a; G X is analogous. Such 
constructions of v{z) and u{x) guarantee that 

-f{u{x),v{z)) = g{x,z), (24) 

for all {x,z) E X X Z, and thus the evaluation of p{x), see f|T6l) . is fully 
equivalent to minimization of '-f{u{x), v) with respect to all vectors v obeying 
f l23|l . Therefore, the latter problem is also a NP-hard problem. It is, however, 
more amenable to the relaxation technique developed further. 
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3.2 Trust Region Based Relaxation 

We observe that each x G X in (|T6|) yields a different value of parameter 
u in fl20l) while matrix F remains unchanged. Thus, we can afford some 
intensive matrix preprocessing in order to exploit the repetitive nature of this 
minimization. That is why we turn our attention to the trust region problem 
[19j which permits fast exact solution even for an indefinite matrix F. To 
match the form of the trust region problem, we have to relax constraints (123!) 
into 1 1 I'll = r with r > specified lately. Thus, we obtain problem 

min I - v^Fv + h^v + u^Gv \ . (25) 

The only question left is to adjust the diameter r properly. 

We can set = /i immediately, as each feasible vector v of the original 
problem consists of /i subvectors v'^ of unit norm, see (123|1 . Yet there is a 
possibility of obtaining a tighter relaxation. By the definition of matrices F, 
G and vector /i, see ( [T7|) . ( !T8|) and ( !T9|) . respectively, and by zero mean of all 
HDMR components derived in ([H]) and f ll4l) . we observe that the minimized 
criteria in do not depend on the average value of any subvector f of v. 
Hence, we may shift all elements of each by a constant factor and 
thus rewrite constraint fl23l) as 



1 1 
v"^ := I - -— Zm = Zm[i], := otherwise, (26) 

I I I I 

and the value of 7(u, v) remains unchanged. This observation suggests ad- 
justing a slightly smaller diameter r in this manner 



m=l I ^ ' ' ' 1=1 ' ' I m=l 



which corresponds to the norm of any feasible solution satisfying constraint 
( 126|) . Thus, we obtained as tight relaxation of the original problem as possible 
and we are ready to solve the trust region problem (!25|) . 

From a wide spectra of solution methods of the trust region problem, see 
[2U] . and references therein, we choose one which is computationally expen- 
sive for a one step minimization, but effective in our repetitive setting. At 
first, we find ortoghonal matrix V such that 

F = U^DU (28) 
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holds with diagonal matrix D having its diagonal composed of all eigenvalues 
ordered from the lowest one to the highest one. Then, for a particular u we 
define 

b:=Uh + UG'^u, (29) 
and we find solution v of fl25|) according to 

V := -U^ {D - XI)-\ (30) 

where I is unit matrix and A G (—00, -Dfcfc) solves one-dimensional equation 



i=k 



with an index of the first non-zero element of b denoted by k G {1, . . . ,6}. 
Then, precisely one such A exists and can, for instance, be computed by the 
Newton's method. A more detailed discussion is to be found in [20l[2T| [T9 | [22]. 

In some practical problems, matrix F in ( |25ll may be zero or may have 
a very small norm. Then, we may either use some different approach, e.g. 
linear integer programming |23], or we may solve fl25l) analytically with the 
optimal choice of v determined by formula 



3.3 Estimate of the Exact Minimizer 

At the moment, we briefly summarize the previous procedure. We related 
v{z) G W to each z e Z hj ([23]), and also u{x) G to x G X in a 
similar manner. Next, we found the exact minimizer -0 G of the relaxed 
problem fl25|l . which is in fact parametrized hy x & X as v = v{u{x)) = v{x). 
Such v{x) generally does not correspond to any feasible solution z E Z of 
the original problem ( !T6|) . Yet, we may still use the knowledge of v{x) to 
estimate the value of p{x). 

First, we easily obtain a lower bound 

p{x) := ■y{u{x),v{x)). (33) 

Indeed, if we compare the derivation of ( l25|) with the original problem (flGl) . 
we realize that ''y{u{x),v{x)) minimizes the same criteria with respect to a 
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larger set. Therefore, we have p{x) < p{x) for all x G X. This lower bound 
p{x) is, however, problematic. It gives only poor estimates on p{x) as we 
show in a numerical experiment in Section 13.41 

On that account, we develop a more accurate upper estimate on p{x) 
now. We simply interpret each value vY^{x) as an indicator of subobtimality 
of the related element Zmli] G Z^- In other words, the higher the element 
v^{x) is, the lower cirteria g{x, zi, . . . , z^) we may expect when adjusting 
to Zm[i]- One can came up with many different ways of such "rounding" of 
v[x) to some z G and thus there is not any guarantee that the following 
heuristic is the best possible. 

From now on, we again omit parameter x G X in the notation for the 
sake of simplicity. We start with normalizing vector v in two setps. We shift 
it to be non-negative 

V := V — min Vi, (34) 

iG{l,...,0} 

and then we rescale all its subvectors 5™, m G {1, . . . , /i}, as follows 

t)" := {)" / max {)™. (35) 

i(i{l,...,\Zm\} 

Thus, for all m there is at least one element of ■O™ equal to 1, and for all 
i G {1, . . . , \Zm\} it holds that v"^ G [0, 1]. Further, we define function q{z) 
indicating the quality of a particular z E Z (with respect to an implicit 
parameter x G X) 

?(^) ■■= n where z = {Z,[t,], . . . , ZjzJ) . (36) 

m=l 

From non-negativity of v we observe that q{z) G [0, 1], and the maximum of 
q{z) with respect to z E Z is equal to 1 by fl35|) . Then, we define 



Z"^ ■={zeZ : q{z) > 0} (37) 

for any G [0, 1]. Thus, Z^ = Z and Z^ contains only such z & Z that all the 
corresponding -0™ are maximizers used in the denominator in (135|) . We note 
that Z'^ can be enumerated in a component-wise manner using (15^ without 
passing the whole Z. Then, we substitute Z^ <Z Z for Z in ( 1TB]) . and we find 
an upper bound p'^(x) on p{x) 

p'^{x) := min^(x, z), (38) 
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by enumerating g{x,z) for all z & Z"^. The lower the value of we choose, 
the larger the Z*^ that we obtain and the tighter the upper bound p'^(x) we 
find; nonetheless, at the price of slower enumeration in fl38|) . 

Once the diagonalization in (l28l) is done, it is in fact easy to compute 
p'^(x) for any x E X. We construct u{x) by the one-to-one correspondence 
f l2^ . then we compute vector b{x) according to fl2^ . find the related value 
of X{x) following (EI]), and finally calculate candidate v{x) which enters the 
already introduced procedure that leads to p'^{x) defined by (138|) . Thus, 
we found approximate minima of a general function g{x, z) in HDMR form 
over z E Z for all parameters x G X. This permits us to apply HDMR to 
effectively approximate the Bellman equation in Section HI 

3.4 Minimization of a Random Function 

Now, we dedicate a short section to a numerical verification of the previously 
introduced technique. We solved problem (fT6ll exactly for a random function 
g{x,z). For the sake of simplicity, we omitted parameter x and set G = 
in (l25l) . Next, we choose the minimization domain Z = {1, . . . , 150}^, 
we generated HDMR components gmn randomly with values chosen from 
uniform distribution on interval [0, 1] and finally we adjusted them to satisfy 
(jS]). Then, we found a lower estimate on minima p according to ( l33i) and 
upper estimates on minima p'^ for various choices of parameter (j) following 
(l38ll . All results were averaged with respect to 20 random samples of F and 
h and depicted in Fig. [H The relative error of upper bound p'^ is defined 
as the distance from minimum of g{z) rescaled and shifted in such a way 
that the exact minimum corresponds to whereas the average value of the 
minimized criteria corresponds to 1. We observe that the lower the value 
of is, the better the approximation we obtain as we expected. On the 
other hand, there was a linear grow of logdZ"^!) when decreasing 0. We 
suppose that a detailed elaboration of this relation could serve as a basis 
for an error estimation heuristics. Concerning the lower bound, we obtained 
p = —5.55 holding the same scale as previously, whereas the worst upper 
bound p^ = 0.47 is almost 12 times closer to the exact minimum 0. As both 
have similar computational complexity, we omit lower bound estimate p from 
further considerations. 

These experiments were carried out on CPU Intel Core 13, 2.10 GHz with 
4GB of RAM in Matlab 7. It took 169 seconds to find the exact minimum, 
whereas the average time necessary to diagonalize matrix F was 1.3 seconds. 
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Figure 1: Value of log(|2'''^|) and a relative error of p'^ plotted against various 
values of 0. The relative error is the distance of p'^ from the minimum of 
g{z) rescaled and shifted in such a way that exact minimum corresponds to 
whereas the average value of the minimized criteria correspods to 1. The 
depicted results were averaged over 20 different realizations of matrix F and 
vector h 

We note that this matrix diagonalization is done only once in the full setting 
of (fT6|) . whereas the time necessary for exact minimization of g{x^ z) for each 
X G X is still the same. 

4 Approximate DP based on HDMR 

This is the right time to briefly introduce the decison-making theory. A decision- 
making task stands for selecting a decision-maker's strategy in order to reach 
his aim with respect to the part of the world (so-called system). The decision 
maker observes or influences the system over a finite decision making horizon 
T < oo. Value ?/j G j/t, t G T = {1, . . . , r}, provides the decision maker with 
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all the knowledge influencing the future behaviour of the system. Thus, yt 
includes the current state of the system together with other external data 
observed up to time instant t. Nonetheless, we will reference yt simply as 
a state of the system. Next, the decisions (actions) of a decision- maker are 
denoted as G Af. A strategy is a collection of mappings of the current 
state yt-i G Yt-i into the choice of the next decision at € At, for the optimal 
strategy we use symbols {at{iit-i)}t&T- To formalize the decision- maker's 
aims, a concept of the additive loss function is used, lt{at,yt), depending on 
the current action at and system state yt- The involved system is described 
in a probabilistic manner by the following collection of pdfs called the outer 
Markov model of a system 

{ft{yt\at,yt-i)}t^T- (39) 
For the expected value of variable x conditioned by y we use 

£[x\y]:= / xf{x\y)dx. (40) 
Jx 

Knowing the collection of loss functions {lt{at,yt)}l=i together with the sys- 
tem model ( 139|) . the optimal strategy {at{yt-i)}teT is fully determined by the 
Bellman function 



Vt-iiyt-i) = min S[lt{at,yt) + Vt{yt)\ at, yt-i] 

ate At 



(41) 



which has to be recursively evaluated at all times t eT with the boundary 
condition = 0. As this standard form of the Bellman equation ( 14 ip is not 
convenient to our purposes, we rewrite it in an equivalent form 



Et{at, yt-i) = S 

Er+l = 0. 



k{yt,at)+ min Et+i{at+i,yt) 

at+i£At+i 



(42) 



Then, Et+i{at+i,yt) is the expected loss-to-go provided we choose action 
at+i in the system state yt- In this setting, the optimal strategy dt{yt-i) is 
composed of actions satisfying 



atiyt-i) ■= &TgmmEt{at,yt~i] 

ate At 



(43) 
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4.1 Offline Part - Approximate Evaluation of Et 

Now, we are prepared to apply both HDMR developed in Section [2] and fast 
approximate minimization of functions in HDMR form, see Section [3l to ef- 
fectively approximate Et{at,yt-i) defined by fH2l) . This part of algorithm is 
the most demanding concerning the computational complexity. Thus, func- 
tion Et{at,yt-i) is typically computed offline, stored as a look-up table (in 
our case in HDMR form) , and then used during the online part of a decision- 
making algorithm to find the approximated optimal action by using (H3|l . 
The proposed algorithm runs in the backward manner analogously to the 
evaluation of the exact Bellman equation fHT]) . 

We denote the approximated loss-to-go function by Et even though for 
t < r it is not the exact HDMR of E^ . For the first step, t = r, we rewrite 
(iSD as 

Er{ar,yr-l) = S[lr{yr,ar) \ Or, Z/r-l ] ■ (44) 

To obtain all HDMR components E^-^q, Et^^, ET-^rnn of -^^(ot, l/r-i)) "we eval- 
uate i?^(a^, ?/^_i) for each pair (cir,l/r-i) ^ A^- x Y^-i and add the resulting 
value to proper sums in (III]) . 

Next, suppose we know all Et+i^, Et+i^m, Et+i^mn and we want to find an 
approximation of Et in the form of HDMR. Substituting Et+i into ( 142|) we 
have 



Et{at,yt-i) ~ £ 



hiyt,at)+ min Et+i{at+i,yt)) 



a-uyt-i 



(45) 



This suggests defining Et as HDMR of the expression on the right-hand side, 
or at least as HDMR of some approximation of this expression. On that 
account we denote 

■Kt{yt):= min Et+i{at+uyt) (46) 

and search for its upper bound 7if{yt) following the instructions of Section 
[3l The choice of an auxiliary parameter G [0, 1] determining the precision 
of the upper bound estimate is discussed at the end of this section. Looking 
at (fT6l) . we identify g = Et+i, X = yt and Z = At+i. We note that all the 
HDMR components of Et+i that depend only on yt may be directly inter- 
changed with minimization in (1461) and thus not considered at the moment. 
Based on the knowledge of such Et+i^, Et+i^m and Et+i^mn that depend on 
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at+i, we construct matrices Ft, Gt and vector ht according to (fT7|) . (fTSI) and 
(IT^ . and we formulate the relaxed problem (123]) . Then, we find its exact 
minimizer Vt{yt) in a direct analogy to fl30l) with matrix diagonalization 

Ft = UfDtUt (47) 

involved. The diagonalized matrix Ft is typically small and does not grow 
much with t as its size fl2T]) corresponds to the space of actions at. Knowing 
^t{yt)i we calculate an upper bound on minimum applying procedure fl55]) . 
and finally we add (restore) all HDMR components of -Ef+i depending only 
on Ut. Thus, we obtained an upper bound on minimum of Titivt)- We note 
that diagonalization (H7|) is carried out just once for each time step t, and so 
we can effectively evaluate 7ff{yt) for all yt G yt- Now, we find Et{at, yt-i) by 
evaluating the right-hand side of the following formula 



Et{at,yt-i) ^ £ 



k{yt,at) + 7itiyt) at,yt-i (48) 



for each pair {at, yt-i) & At x Yt-i and add the resulting value to proper 
sums in f[T^ immediately. Thus, we construct all HDMR components i?i_0, 
Et^m and Et^mn, avoiding the full dimensional representation of Ft. 

Finally, we repeat the whole procedure to recursively compute function 
Et{at,yt-i) for all t & T. Once the calculation of each particular Et is 
finished, we can completely remove all components of Et+i independent of 
at+i non-affecting the suboptimal strategy computed in the next section. 



4.2 Online Part - Approximate Minimization of Et 

The previously described part of the algorithm has to be implemented in 
advance, or "off-line" manner because of high computational demands. As 
functions {Et{at, yt-i)}t<=:T are stored only in the form of HDMR, it is possible 
to take larger decision horizons r into consideration. Nonetheless, we still 
have to choose an approximated (suboptimal) action at in the real time, or 
"on-line" manner. Then, the previously observed system state yt-i is fixed 
and so we solve just one minimization problem in each time step t in opposite 
to the recursive evaluation of (H5|) . Substituting Ft into (H3l) . we define 

at{yt-i) \= &Tgmm.Et{at,yt-i). (49) 

We note that at{yt-i) does not stand for HDMR approximation of at{yt-i) 
defined by 



17 



There are many ways how to find fij, or at least some its approximation. 
An interesting choice can be a trust region based relaxation as we may ex- 
ploit our previous calculations. We may represent HDMR components of 
Et{at,yt^i) in the basis obtained in P7|) . If we store all matrices Ut, Dt, 
and also matrices Gt and vectors ht involved in approximate minimization 
of Titiut) defined by fHB]) . we may find approximate minimizer of in ac- 
cordance with Section [3] again. However, even some more accurate technique 
may be used in one-shot only minimization (H9|) . Any algorithm for binary 
quadratic programming [21] may be applied to solve (H9l) via equivalent re- 
formulation (!20|) constrained by (!23|) . For smaller sets At, we can find even 
exact value of G At by direct enumeration of (149 p . We decided to use 
this most accurate approach in Section 14.31 in order to show the extent to 
which Et{at,yt-i) in the form of HDMR may be compared with exact value 
of Et{at,yt-i)- 

4.3 N-armed Bandit Problem 

As an ilustrative example, we propose here an approximate solution to the 
A^-armed bandit problem, which was extremely important in approximate 
dynamic programming, see for instance fiUi [1] and references therein. We 
compare its exact solution with HDMR based approximation. 

Conceive a game where the player has to choose between different options, 
e.g. levers of A^-armed bandit, with numerical rewards chosen from various 
stationary probability distributions. The payoff probabilities of levers are 
fixed, yet unknown, and thus the player has to estimate them. Then the 
problem is to identify the most winning lever. Even though this problem 
could be formulated easily, it is a real issue for a longer game horizon as it 
is hard to balance exploration and exploitation. Winning in the first round 
does not imply that the player should stick to the same lever as it prevents 
learning of the payoff probability of other levers. 

We considered game with 9-armed bandit and decision making horizon 
of r = 8 steps to be able to compare approximated suboptimal strategies 
with the exact optimal strategy. Using the previous notation, yt stands for 
the observed value (payoff) yt G Y = {0,1} and at denotes the decision of 
a player in each time step t G T = {l,...,r}. The arms of the bandit are 
represented by two-dimensional space of actions, at E A = {1,2,3}^. The 
loss function 

hiyuat) = -yt (50) 
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represents the aim of maximizing the payoff yt in each round of the game. 
Next, we introduce a sufficient statistic Sj, dom(sj) = y x A, which com- 
presses the previous game results in a small vector 

st{y, a) := st-i{y, a) + 5y^^y 5a,,a, (51) 

with 5 standing for standard Kronecker's symbol. Thus, St{y,a) counts how 
many times we observed a value y after selecting an action a in first t rounds 
of the game. We set sq = for the moment. In fact, St may be included into 
the system state yt, but for the sake of simplicity we treat it separately here. 
To compute the expected loss in fH21) . the knowledge of the Markov system 
model (139|) is necessary 

ff \ \ st-iiyt, at) + 1 

ft{yt\at, st-i) = — — — . (52) 

st-i{yt, at) + st-i{l - yt, at) + 2 

This model was obtained using the technique of Bayesian estimation [23]. In 
the following experiment, the 9- armed bandit was simulated using pseudo- 
random generator with fixed payoff probability matrix P defined for a E A 
as follows 

■.= Frohiy=l\a=[t,j]). (53) 

During the experiment, it turned out that high-symmetry of A^-armed 
bandit is unsuitable for our purposes. If the underlying payoff probability P 
is completely unknown, and for the prior information it holds so{y, a) = for 
all ?/ G y, a G A, then all the bandit arms have the same expected loss when 
averaged over all the possible system trajectories. Thus, Ft corresponding 
to differences of the expected loss among various arms is equal to zero. We 
may still use the previously introduced algorithm, see the note near (!32|) . 
but we would miss its most interesting part, i.e. the trust region based 
approximate minimization described in Section 13.21 We note that this high 
level of symmetry is very unlikely for a real-world problem. 

Thus, we decided to slightly perturb the experiment to suppress its sym- 
metry. We put a prior information on one arm, So(0, [1, 1]) = 1, and in this 
setting we computed the exact values of {Et}t£T following f H2]) and also all 
HDMR functions {Ef}t^T according to Section H?T1 This time we explicitly 
stated that Ef depends also on the value of 0, see (1481) . The disk space 
necessary to save {Et}teT and each {Ef}t^T in Matlab .mat file was 2.3 MB 
and 0.1 MB, respectively. The optimal strategy was derived from Et using 
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, and suboptimal strategies parametrized by were derived according to 

All these strategies were used to simulate 20000 plays with 9-armed ban- 
dit, each of them consisting of r = 8 steps. The payoff probabilities Ptj of the 
bandit were chosen randomly from uniform distribution on interval [0, 1] with 
the only exception of fixed payoff probability Pu = 0.1 corresponding to the 
only non-zero prior so(0, [1, 1]). The average payoff of the optimal strategy 
was 0.653, and the average payoffs obtained for various values of are de- 
picted in Fig. |2l The strategy derived from was rather sucessful, it gained 
0.632 on average. It indicates the practical applicability of the less acccurate 
approximation of Et, when 0=1 and contains typically just one element. 
Then, the whole estimating of the exact minimizer, see Section 13. 3[ amounts 
only to "rounding" of trust region problem minimizer to an approximate 
minimizer of HDMR. The precision of HDMR approximation itself may be 
deduced from the average payoff 0.638 obtained for = 0, which corresponds 
to the exact minimization in P6|) . The closer the is to 0, the closer Ef 
is to Et by its definition. However, this monotonicity does not hold for the 
derived strategies. Yet, on average it holds again, see the interpolated line in 
Fig. [21 The slope of this line is rather small; it means that in this particular 
problem the average payoff just slightly increases when decreasing 0. It is 
in contrast with Fig. [1] where the upper bound estimate depended strongly 
on the minimization precision tuned by parameter 0. Nonetheless, if we find 
upper bound 7rf_i{yr-i) on ( H6l) for various and compare it with exact 
minimizer Trr-i{yT-i), we obtain dependence on similar to that depicted 
in Fig. [1] Thus, we observed better performance of strategies derived with 
close to 1 than we can expect from the quality of upper bound estimates 
on Ef. This may be explained by some sort of systematic error produced 
by approximate minimization. Consider some fixed 0. If all values of Ef 
overestimate (or underestimate) values of E^ by the same number, the ap- 
proximate minimization would give inaccurate results, but both approximate 
and optimal strategies derived from Ef and Ef, respectively, would be the 
same. However, more work has to be done to fully verify this conjecture, 
which is likely to be problem-dependent. 
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Figure 2: Average payoffs obtained from approximated strategies derived 
from Ef for various G [0, 1]. Tlie average payoff of tfie exact optimal strat- 
egy derived from Et was 0.653. Tliese results are based on 20000 simulated 
plays with 9-armed bandit, each of them consisting of r = 8 steps. The payoff 
probabilities Pij of the bandit were chosen randomly from uniform distribu- 
tion on interval [0, 1]. The only exception was payoff probability Pu = 0.1, 
which was kept fixed to avoid complete symmetry of the problem as discussed 
in Section 14.31 
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5 Conclusion 



The aim of this work was to cope with both computational and memory 
demands necessary to find and represent the optimal decision making strat- 
egy. The proposed variant of approximate dynamic programming based on 
HDMR is appealing for two reasons. At first, this approximation consid- 
erably reduces memory demands, but, more importantly, it also enables a 
fast approximate minimization of the approximated Bellman function. Re- 
sults of numerical simulation proved that the proposed variant of dynamic 
approximate programming is a viable technique. 

As for all the approximate methods surveyed at the beginning of Section 
m the one proposed in this article cannot be assigned to any of these classes 
directly. It is based on the Bellman function approximation; however, looking 
at its internal structure it may be considered also as an aggregation method 
where each HDMR component aggregates a different coordinates. Next, the 
point- wise construction of HDMR resembles the learning phase of the artifical 
neural networks, yet it is more straightforward. 

A bottleneck of the proposed approximation technique is the fact that it 
still needs to pass through the whole decision tree. Nonetheless, it can easily 
be parallelized, or randomly sampled HDMR may be used [25], or some 
reinforcement learning algorithm that aims at this problem can be applied. 
The fact that HDMR enables a fast approximate minimization would still be 
worthwhile. 

The author would like to express his gratitude to RNDr. Ondfej Pangrac, 
Ph.D., for inspiring discussion about discrete optimization, to Irena Dvofakova, 
prom, fil., for significant help with the language of the manuscript, and finally 
to Ing. Vaclav Smidl, Ph.D., for constructive criticism and encouragement. 
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